{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Convert data to NILMTK format and load into NILMTK"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "NILMTK uses an open file format based on the HDF5 binary file format to store both the power data and the metadata.  The very first step when using NILMTK is to convert your dataset to the NILMTK HDF5 file format."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NOTE**: If you are on Windows, remember to escape the back-slashes, use forward-slashs, or use raw-strings when passing paths in Python, e.g. one of the following would work:\n",
    "\n",
    "```python\n",
    "convert_redd('c:\\\\data\\\\REDD\\\\low_freq', r'c:\\\\data\\\\REDD\\\\redd.h5')\n",
    "convert_redd('c:/data/REDD/low_freq', 'c:/data/REDD/redd.h5')\n",
    "convert_redd(r'c:\\data\\REDD\\low_freq', r'c:\\data\\REDD\\redd.h5')\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## REDD"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Converting the REDD dataset is easy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading house 1... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 \n",
      "Loading house 2... 1 2 3 4 5 6 7 8 9 10 11 \n",
      "Loading house 3... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 \n",
      "Loading house 4... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 \n",
      "Loading house 5... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 \n",
      "Loading house 6... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 \n",
      "Loaded metadata\n",
      "Done converting YAML metadata to HDF5!\n",
      "Done converting REDD to HDF5!\n"
     ]
    }
   ],
   "source": [
    "from nilmtk.dataset_converters import convert_redd\n",
    "convert_redd('/data/REDD/low_freq', '/data/REDD/redd.h5')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now `redd.h5` holds all the REDD power data and all the relevant metadata.  In NILMTK v0.2 this conversion only uses a tiny fraction of the system memory (unlike NILMTK v0.1 which would guzzle ~1 GByte of RAM just to do the dataset conversion!).\n",
    "\n",
    "Of course, if you want to run `convert_redd` on your own machine then you first need to download [REDD](http://redd.csail.mit.edu), decompress it and pass the relevant `source_directory` and `output_filename` to `convert_redd()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Other datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At the time of writing, [NILMTK contains converters for 8 datasets](https://github.com/nilmtk/nilmtk/tree/master/nilmtk/dataset_converters).\n",
    "\n",
    "Contributing a new converter is easy and highly encouraged!  [Learn how to write a dataset converter](https://github.com/nilmtk/nilmtk/blob/master/docs/manual/development_guide/writing_a_dataset_converter.md)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Open HDF5 in NILMTK"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nilmtk import DataSet\n",
    "from nilmtk.utils import print_dict\n",
    "\n",
    "redd = DataSet('/data/REDD/redd.h5')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point, all the metadata has been loaded into memory but none of the power data has been loaded.  This is our first encounter with a fundamental difference between NILMTK v0.1 and v0.2:  NILMTK v0.1 used to eagerly load the entire dataset into memory before you did any actual work on the data.  NILMTK v0.2 is lazy!  It won't load data into memory until you tell it what you want to do with the data (and, even then, large dataset will be loaded in chunks that fit into memory).  This allows NILMTK v0.2 to work with arbitrarily large datasets (datasets too large to fit into memory) without choking your system."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exploring the `DataSet` object"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's have a quick poke around to see what's in this `redd` object...\n",
    "\n",
    "There is a lot of metadata associated with the dataset, including information about the two models of meter device the authors used to record REDD:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<ul><li><strong>name</strong>: REDD</li><li><strong>long_name</strong>: The Reference Energy Disaggregation Data set</li><li><strong>creators</strong>: <ul><li>Kolter, Zico</li><li>Johnson, Matthew</li></ul></li><li><strong>publication_date</strong>: 2011</li><li><strong>institution</strong>: Massachusetts Institute of Technology (MIT)</li><li><strong>contact</strong>: zkolter@cs.cmu.edu</li><li><strong>description</strong>: Several weeks of power data for 6 different homes.</li><li><strong>subject</strong>: Disaggregated power demand from domestic buildings.</li><li><strong>number_of_buildings</strong>: 6</li><li><strong>timezone</strong>: US/Eastern</li><li><strong>geo_location</strong>: <ul><li><strong>locality</strong>: Massachusetts</li><li><strong>country</strong>: US</li><li><strong>latitude</strong>: 42.360091</li><li><strong>longitude</strong>: -71.09416</li></ul></li><li><strong>related_documents</strong>: <ul><li><a href=\"http://redd.csail.mit.edu\">http://redd.csail.mit.edu</a></li><li><a href=\"J. Zico Kolter and Matthew J. Johnson. REDD: A public data set for energy disaggregation research. In proceedings of the SustKDD workshop on Data Mining Applications in Sustainability, 2011. http://redd.csail.mit.edu/kolter-kddsust11.pdf\n",
       "\">J. Zico Kolter and Matthew J. Johnson. REDD: A public data set for energy disaggregation research. In proceedings of the SustKDD workshop on Data Mining Applications in Sustainability, 2011. http://redd.csail.mit.edu/kolter-kddsust11.pdf\n",
       "</a></li></ul></li><li><strong>schema</strong>: <a href=\"https://github.com/nilmtk/nilm_metadata/tree/v0.2\">https://github.com/nilmtk/nilm_metadata/tree/v0.2</a></li><li><strong>meter_devices</strong>: <ul><li><strong>eMonitor</strong>: <ul><li><strong>model</strong>: eMonitor</li><li><strong>manufacturer</strong>: Powerhouse Dynamics</li><li><strong>manufacturer_url</strong>: <a href=\"http://powerhousedynamics.com\">http://powerhousedynamics.com</a></li><li><strong>description</strong>: <a href=\"Measures circuit-level power demand.  Comes with 24 CTs. This FAQ page suggests the eMonitor measures real (active) power: http://www.energycircle.com/node/14103  although the REDD readme.txt says all channels record apparent power.\n",
       "\">Measures circuit-level power demand.  Comes with 24 CTs. This FAQ page suggests the eMonitor measures real (active) power: http://www.energycircle.com/node/14103  although the REDD readme.txt says all channels record apparent power.\n",
       "</a></li><li><strong>sample_period</strong>: 3</li><li><strong>max_sample_period</strong>: 50</li><li><strong>measurements</strong>: <ul><li>{'physical_quantity': 'power', 'type': 'active', 'upper_limit': 5000, 'lower_limit': 0}</li></ul></li><li><strong>wireless</strong>: False</li></ul></li><li><strong>REDD_whole_house</strong>: <ul><li><strong>description</strong>: <a href=\"REDD's DIY power meter used to measure whole-home AC waveforms at high frequency.  To quote from their paper: \"CTs from TED (http://www.theenergydetective.com) to measure current in the power mains, a Pico TA041 oscilloscope probe (http://www.picotechnologies.com) to measure voltage for one of the two phases in the home, and a National Instruments NI-9239 analog to digital converter to transform both these analog signals to digital readings. This A/D converter has 24 bit resolution with noise of approximately 70 ÂµV, which determines the noise level of our current and voltage readings: the TED CTs are rated for 200 amp circuits and a maximum of 3 volts, so we are able to differentiate between currents of approximately ((200))(70 Ã— 10âˆ’6)/(3) = 4.66mA, corresponding to power changes of about 0.5 watts. Similarly, since we use a 1:100 voltage stepdown in the oscilloscope probe, we can detect voltage differences of about 7mV.\"\n",
       "\">REDD's DIY power meter used to measure whole-home AC waveforms at high frequency.  To quote from their paper: \"CTs from TED (http://www.theenergydetective.com) to measure current in the power mains, a Pico TA041 oscilloscope probe (http://www.picotechnologies.com) to measure voltage for one of the two phases in the home, and a National Instruments NI-9239 analog to digital converter to transform both these analog signals to digital readings. This A/D converter has 24 bit resolution with noise of approximately 70 ÂµV, which determines the noise level of our current and voltage readings: the TED CTs are rated for 200 amp circuits and a maximum of 3 volts, so we are able to differentiate between currents of approximately ((200))(70 Ã— 10âˆ’6)/(3) = 4.66mA, corresponding to power changes of about 0.5 watts. Similarly, since we use a 1:100 voltage stepdown in the oscilloscope probe, we can detect voltage differences of about 7mV.\"\n",
       "</a></li><li><strong>sample_period</strong>: 1</li><li><strong>max_sample_period</strong>: 30</li><li><strong>measurements</strong>: <ul><li>{'physical_quantity': 'power', 'type': 'apparent', 'upper_limit': 50000, 'lower_limit': 0}</li></ul></li><li><strong>wireless</strong>: False</li></ul></li></ul></li></ul>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print_dict(redd.metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also have all the buildings available as an [OrderedDict](https://docs.python.org/2/library/collections.html#collections.OrderedDict) (indexed from 1 not 0 because every dataset we are aware of starts numbering buildings from 1 not 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<ul><li><strong>1</strong>: Building(instance=1, dataset='REDD')</li><li><strong>2</strong>: Building(instance=2, dataset='REDD')</li><li><strong>3</strong>: Building(instance=3, dataset='REDD')</li><li><strong>4</strong>: Building(instance=4, dataset='REDD')</li><li><strong>5</strong>: Building(instance=5, dataset='REDD')</li><li><strong>6</strong>: Building(instance=6, dataset='REDD')</li></ul>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print_dict(redd.buildings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each building has a little bit of metadata associated with it (there isn't much building-specific metadata in REDD):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<ul><li><strong>instance</strong>: 1</li><li><strong>original_name</strong>: house_1</li><li><strong>dataset</strong>: REDD</li></ul>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print_dict(redd.buildings[1].metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each building has an `elec` attribute which is a `MeterGroup` object (much more about those soon!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MeterGroup(meters=\n",
       "  ElecMeter(instance=1, building=1, dataset='REDD', site_meter, appliances=[])\n",
       "  ElecMeter(instance=2, building=1, dataset='REDD', site_meter, appliances=[])\n",
       "  ElecMeter(instance=5, building=1, dataset='REDD', appliances=[Appliance(type='fridge', instance=1)])\n",
       "  ElecMeter(instance=6, building=1, dataset='REDD', appliances=[Appliance(type='dish washer', instance=1)])\n",
       "  ElecMeter(instance=7, building=1, dataset='REDD', appliances=[Appliance(type='sockets', instance=1)])\n",
       "  ElecMeter(instance=8, building=1, dataset='REDD', appliances=[Appliance(type='sockets', instance=2)])\n",
       "  ElecMeter(instance=9, building=1, dataset='REDD', appliances=[Appliance(type='light', instance=1)])\n",
       "  ElecMeter(instance=11, building=1, dataset='REDD', appliances=[Appliance(type='microwave', instance=1)])\n",
       "  ElecMeter(instance=12, building=1, dataset='REDD', appliances=[Appliance(type='unknown', instance=1)])\n",
       "  ElecMeter(instance=13, building=1, dataset='REDD', appliances=[Appliance(type='electric space heater', instance=1)])\n",
       "  ElecMeter(instance=14, building=1, dataset='REDD', appliances=[Appliance(type='electric stove', instance=1)])\n",
       "  ElecMeter(instance=15, building=1, dataset='REDD', appliances=[Appliance(type='sockets', instance=3)])\n",
       "  ElecMeter(instance=16, building=1, dataset='REDD', appliances=[Appliance(type='sockets', instance=4)])\n",
       "  ElecMeter(instance=17, building=1, dataset='REDD', appliances=[Appliance(type='light', instance=2)])\n",
       "  ElecMeter(instance=18, building=1, dataset='REDD', appliances=[Appliance(type='light', instance=3)])\n",
       "  ElecMeter(instance=19, building=1, dataset='REDD', appliances=[Appliance(type='unknown', instance=2)])\n",
       "  MeterGroup(meters=\n",
       "    ElecMeter(instance=3, building=1, dataset='REDD', appliances=[Appliance(type='electric oven', instance=1)])\n",
       "    ElecMeter(instance=4, building=1, dataset='REDD', appliances=[Appliance(type='electric oven', instance=1)])\n",
       "  )\n",
       "  MeterGroup(meters=\n",
       "    ElecMeter(instance=10, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])\n",
       "    ElecMeter(instance=20, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])\n",
       "  )\n",
       ")"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "redd.buildings[1].elec"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Yup, that's where all the meat lies!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
